@florian-lefebvre
Member

@florian-lefebvre florian-lefebvre commented Oct 21, 2025

Todo

  • validate approach
  • code
  • todos
  • check compat between v5 loaders and Astro v6
  • tests
  • docs
  • changeset
  • pr desc

Changes

Testing

Docs

@florian-lefebvre florian-lefebvre added this to the v6.0.0 milestone Oct 21, 2025
@florian-lefebvre florian-lefebvre self-assigned this Oct 21, 2025
@changeset-bot

changeset-bot bot commented Oct 21, 2025

⚠️ No Changeset found

Latest commit: 771939a

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

@github-actions github-actions bot added the pkg: astro Related to the core `astro` package (scope) label Oct 21, 2025
@florian-lefebvre
Member Author

@ascorbic this is very WIP but can you have a look at the code and let me know what you think about the approach?

@florian-lefebvre florian-lefebvre linked an issue Oct 28, 2025 that may be closed by this pull request
Contributor

@ascorbic ascorbic left a comment

Can you clarify the scope of what you are trying to fix here? As I understand it, there are a couple of issues that could be fixed by running the schema parse later:

  • resolve references
  • support refine etc with image helper (and generally improve the image handling)
  • don't require loader authors to manually parse data

Is the goal to fix all of these, or something else?

Comment on lines 568 to 573
new Traverse(data).forEach((_, val) => {
  if (typeof val === 'string' && val.startsWith(IMAGE_IMPORT_PREFIX)) {
    const src = val.replace(IMAGE_IMPORT_PREFIX, '');
    foundAssets.add(src);
  }
});
Contributor

If we're doing this two-pass approach, could we find a better way to handle images? This has always felt like a hack, but it was needed because of when it had to run.

Member Author

Probably, I chose the simplest approach possible for this POC by simply copying

Member Author

I looked into this and I don't think the new approach would allow us to get rid of this. Basically, during the first pass we don't know which properties are images, so we can't store them for processing before the second pass, IIUC.

@florian-lefebvre
Member Author

florian-lefebvre commented Nov 4, 2025

So the goal here is to support returning a schema from load() instead of having it as a top-level property. AFAIK this means:

  • parseData() cannot, well, parse data anymore because the schema may not be known yet. We only get the final schema after running load()
  • set() / get() etc must still work on unparsed data

So my solution to this problem was to have an intermediary store. It contains the initial parsed data and the loader can manipulate it by calling any method. Then once load() is done, we replay events on the real data store.
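
A minimal sketch of the intermediary-store idea described above, assuming a recorded event log that is replayed once the schema is known. `IntermediateStore`, `StoreEvent`, `replay` and the `Schema` shape are hypothetical names for illustration, not Astro's actual types, and `load()` is synchronous here only to keep the example short:

```typescript
// Hypothetical shapes for illustration; these are not Astro's real types.
type Schema<T> = { parse(input: unknown): T };

type StoreEvent =
  | { type: 'set'; id: string; data: unknown }
  | { type: 'delete'; id: string };

// First pass: holds unparsed data and records every mutation the loader makes.
class IntermediateStore {
  readonly events: StoreEvent[] = [];
  private readonly entries = new Map<string, unknown>();

  set(id: string, data: unknown): void {
    this.entries.set(id, data);
    this.events.push({ type: 'set', id, data });
  }

  get(id: string): unknown {
    return this.entries.get(id);
  }

  delete(id: string): void {
    this.entries.delete(id);
    this.events.push({ type: 'delete', id });
  }
}

// Second pass: replay the recorded events on the real store, parsing each
// entry now that the schema returned by load() is available.
function replay<T>(events: StoreEvent[], schema: Schema<T>, target: Map<string, T>): void {
  for (const event of events) {
    if (event.type === 'set') {
      target.set(event.id, schema.parse(event.data));
    } else {
      target.delete(event.id);
    }
  }
}

type Post = { title: string };

// A loader whose schema is only known once load() has run.
const loader = {
  name: 'example-loader',
  load(store: IntermediateStore): Schema<Post> {
    store.set('a', { title: 'Hello' });
    store.set('b', { title: 'Draft' });
    store.delete('b');
    return {
      parse(input: unknown): Post {
        const title = (input as { title?: unknown } | null)?.title;
        if (typeof title !== 'string') throw new Error('title must be a string');
        return { title };
      },
    };
  },
};

const intermediate = new IntermediateStore();
const schema = loader.load(intermediate);
const realStore = new Map<string, Post>();
replay(intermediate.events, schema, realStore);
```

Replaying an event log, rather than copying the final state, keeps the order of set and delete calls, which matters if the real store has side effects per operation.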

All the potential improvements you mentioned are basically unintended and out of scope of this PR (except parseData())

@ascorbic
Contributor

ascorbic commented Nov 4, 2025

It seems to me that if we're going to fix this, we should at the very least ensure this is designed in such a way that the others can be fixed too, even if we don't fix them immediately

@florian-lefebvre
Member Author

Sure. Can you give me more info about the first 2 bullet points? Like how it works currently and why, and what improvement you'd like

@ascorbic
Contributor

ascorbic commented Nov 4, 2025

We currently can't check references because other collections may not have been loaded. If doing two passes, we can check that referenced entries exist. This should be a relatively easy win.
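
As a rough sketch of what the second pass could check, assuming every collection has already finished its first pass and is available in memory. The `Reference` and `Collections` shapes and `findInvalidReferences` are hypothetical, and only `id`-based references are handled here:

```typescript
// Hypothetical shapes; not Astro's internal types.
type Reference = { collection: string; id: string };
type Collections = Map<string, Map<string, unknown>>;

function isReference(value: unknown): value is Reference {
  if (typeof value !== 'object' || value === null) return false;
  const candidate = value as Record<string, unknown>;
  return typeof candidate.collection === 'string' && typeof candidate.id === 'string';
}

// Walk one entry's data and return a message for each reference whose
// target entry does not exist in the loaded collections.
function findInvalidReferences(
  data: unknown,
  collections: Collections,
  path: string[],
): string[] {
  if (isReference(data)) {
    const exists = collections.get(data.collection)?.has(data.id) ?? false;
    return exists
      ? []
      : [`${path.join('.')}: cannot find entry ${data.collection}/${data.id}`];
  }
  const errors: string[] = [];
  if (typeof data === 'object' && data !== null) {
    for (const [key, value] of Object.entries(data)) {
      errors.push(...findInvalidReferences(value, collections, [...path, key]));
    }
  }
  return errors;
}

// First pass result: all collections loaded, nothing validated yet.
const collections: Collections = new Map([
  ['authors', new Map<string, unknown>([['ben', { name: 'Ben' }]])],
  [
    'posts',
    new Map<string, unknown>([
      ['hello', { author: { collection: 'authors', id: 'ben' } }],
      ['broken', { author: { collection: 'authors', id: 'nobody' } }],
    ]),
  ],
]);

// Second pass: every collection is known, so references can be checked.
const referenceErrors: string[] = [];
collections.forEach((entries, name) => {
  entries.forEach((data, id) => {
    referenceErrors.push(...findInvalidReferences(data, collections, [name, id]));
  });
});
```

With this sample data, only the `broken` post is reported, because `authors/nobody` was never loaded.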

With images, because of the way we do this placeholder thing and then process images later, we can't return useful metadata from the helper anymore, meaning refine() doesn't work. I'm not sure if this is possible to fix with this, because it may still be too early in the build, but it's worth looking at.
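
To illustrate why a refine()-style check has nothing useful to look at, here is a simplified sketch. The prefix value, `imagePlaceholder`, `isWideEnough` and the metadata values are all made up for illustration; only the name `IMAGE_IMPORT_PREFIX` comes from the code under review:

```typescript
// Assumed value for illustration; the real constant's value is not shown in this thread.
const IMAGE_IMPORT_PREFIX = '__IMAGE_PLACEHOLDER__';

type ImageMetadata = { src: string; width: number; height: number };

// At schema-parse time the helper can only emit a placeholder string,
// because the image has not been processed yet.
function imagePlaceholder(path: string): string {
  return `${IMAGE_IMPORT_PREFIX}${path}`;
}

// A refine()-style predicate that needs real metadata to be meaningful.
function isWideEnough(image: unknown): boolean {
  if (typeof image !== 'object' || image === null) return false;
  const width = (image as { width?: unknown }).width;
  return typeof width === 'number' && width >= 800;
}

// What validation sees while parsing: a string, so the check cannot pass.
const parsed = { cover: imagePlaceholder('./cover.png') };
const checkAtParseTime = isWideEnough(parsed.cover);

// What the check would need: metadata that only exists after the image
// has been processed later in the build (values invented for the example).
const resolved: ImageMetadata = { src: './cover.png', width: 1200, height: 630 };
const checkAfterProcessing = isWideEnough(resolved);
```

Whether a second pass runs late enough for the resolved metadata to exist is exactly the open question raised above.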

@florian-lefebvre
Member Author

@ascorbic I pushed a POC for early references validation, let me know what you think

Comment on lines 338 to 353
new Traverse(data).forEach((ctx, value) => {
  if (!isReference(value)) {
    return;
  }
  // TODO: better data structure
  const collectionStore = filteredRawLoaderResults.find(
    (e) => e.name === value.collection,
  )!.memoryStore;
  const id = 'id' in value ? value.id : value.slug;
  if (!collectionStore.has(id)) {
    // TODO: AstroError
    throw new Error(
      `Invalid reference for ${name}.${ctx.keys!.map((key) => key.toString()).join('.')}: cannot find entry ${id}`,
    );
  }
});
Contributor

Is there no way to do this inside the schema validation? Traversing always seems a bit of a hack

Member Author

Yeah I agree. If reference was passed like image() we could have some context in there but AFAIK with the current implementation there's not much we can do

Contributor

Oh, good point. I forgot it was handled like that. Now I'm trying to remember why it didn't need to traverse before

Member Author

I think that's because reference() is actually never checked? It would only throw at runtime when calling e.g. getEntry(post.data.author)

Contributor

I think pre-content layer it did check

Member Author

Ah yeah definitely, I didn't understand that's what you were talking about

Member Author

Okay so I looked and that's because createReference was passed the lookupMap

const entry = store.get(collection, lookup);

@florian-lefebvre
Member Author

I think the POC works fine so now I'll work on cleaning everything up, tests, docs etc

Labels

pkg: astro Related to the core `astro` package (scope)

Development

Successfully merging this pull request may close these issues.

Deprecate loader types generation from schema

3 participants